Longitudinal Analysis of Electronic Health Information to Identify Possible COVID-19 Sequelae

Ongoing symptoms might follow acute COVID-19. Using electronic health information, we compared pre‒ and post‒COVID-19 diagnostic codes to identify symptoms that had higher encounter incidence in the post‒COVID-19 period as sequelae. This method can be used for hypothesis generation and ongoing monitoring of sequelae of COVID-19 and future emerging diseases.


DISPATCHES
These authors contributed equally to this article.
Because the day of a reported diagnosis is known only to fall between the day of admission and the day of discharge, we randomly assigned a day within the encounter duration as the day of diagnosis for analysis. We generated 5 versions of the dataset with imputed diagnosis dates to capture this uncertainty. To compare diagnosis rates for the pre- and post-COVID-19 intervals, we used 1-sided t-tests of the equality of rates performed on a log rate ratio (RR) scale (9). We limited analyses to ICD-10-CM codes that occurred in >5 encounters during >1 post-COVID-19 interval because RRs for rare events are difficult to interpret (Appendix).
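The date imputation and log-scale rate comparison described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R code: the function names, the uniform draw over the encounter, and the Poisson-based variance approximation (1/a + 1/b for event counts a and b) are assumptions made for the example, and the person-time denominators are hypothetical inputs.

```python
import math
import random
from datetime import date, timedelta


def impute_diagnosis_dates(admit, discharge, rng, n_imputations=5):
    """Draw one candidate diagnosis date per imputed dataset.

    Assumption: each day from admission through discharge (inclusive)
    is equally likely to be the true diagnosis date.
    """
    span_days = (discharge - admit).days
    return [
        admit + timedelta(days=rng.randint(0, span_days))
        for _ in range(n_imputations)
    ]


def log_rate_ratio(post_events, post_time, pre_events, pre_time):
    """Return the log rate ratio (post vs. pre) and its approximate variance.

    Assumption: event counts are treated as Poisson, so the variance of
    the log rate ratio is approximated by 1/post_events + 1/pre_events.
    Both counts must be greater than zero.
    """
    log_rr = math.log((post_events / post_time) / (pre_events / pre_time))
    variance = 1.0 / post_events + 1.0 / pre_events
    return log_rr, variance


if __name__ == "__main__":
    rng = random.Random(2021)  # fixed seed so the sketch is reproducible
    dates = impute_diagnosis_dates(date(2020, 4, 1), date(2020, 4, 10), rng)
    print([d.isoformat() for d in dates])

    log_rr, var = log_rate_ratio(20, 100.0, 10, 100.0)
    print(f"RR = {math.exp(log_rr):.2f}, SE(log RR) = {math.sqrt(var):.3f}")
```

Working on the log scale makes the sampling distribution of the ratio closer to symmetric, which is why the test is applied to log RR rather than to RR itself.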
We evaluated whether RR was >1 in the post- versus pre-COVID-19 intervals by using a t-test that includes variability due to multiple imputations (10). We report results significant at p<0.05 after performing the Benjamini-Hochberg adjustment procedure, which excludes marginally significant results that could have occurred by chance because of the large number of significance tests performed. We performed analyses in R 3.6.0 (The R Foundation for Statistical Computing, https://cran.r-project.org/bin/windows/base/old/3.6.0). We defined diagnoses with significantly increased encounter rates >60 days after the COVID-19 index date relative to the pre-COVID-19 period as possible postacute sequelae.
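The two steps in this paragraph, combining estimates across imputed datasets and then adjusting for multiple testing, can be illustrated with a short Python sketch. It assumes the standard combining rules for multiple imputation (often called Rubin's rules), which the text does not spell out, and implements the Benjamini-Hochberg step-up procedure directly. All function names are hypothetical, and the sketch is not the authors' R code.

```python
import math


def pool_imputations(estimates, variances):
    """Combine per-imputation estimates (e.g., log RRs) and their variances.

    Assumption: standard multiple-imputation combining rules. Returns the
    pooled estimate, its total variance, and the t-test degrees of freedom.
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    if between == 0:
        df = math.inf  # no between-imputation variability
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return pooled, total, df


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value is <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    pooled, total, df = pool_imputations(
        [0.70, 0.66, 0.73, 0.69, 0.71], [0.04, 0.05, 0.04, 0.05, 0.04]
    )
    t_stat = pooled / math.sqrt(total)  # compare with a t distribution on df
    print(f"pooled log RR = {pooled:.3f}, t = {t_stat:.2f}, df = {df:.1f}")
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

The 1-sided p-value for each diagnosis code would come from the upper tail of a t distribution with the returned degrees of freedom; that step is omitted here to keep the sketch free of third-party dependencies.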
Encounters for sequelae of specified infectious and parasitic diseases were increased at least through 149 days after the index date (RR 11.6 at 120-149 days) (Table). Encounters were increased for several months after acute illness for postviral fatigue syndrome, headache, and certain respiratory diseases, including pneumonia and acute respiratory distress syndrome. We identified general sequelae of treatment in intensive care, including polyneuropathy (RR 9.1 at 90-119 days) and myopathy (RR 5.0 at 60-89 days), nonscarring hair loss (RR 2.3-3.5 in multiple intervals beyond 60 days), and pressure ulcers (stage 3 and 4, RR 1.6-1.7 at 60-89 days).
Viral cardiomyopathy (RR 9.8 at 60-89 days) and sepsis codes were increased only in the first 90 days after the index date. Rates of nonfollicular diffuse lymphomas were increased in the 60-119-day periods (RR 272.6-411), but most encounters were for 1 patient. Encounters for stage 3 chronic kidney disease (RR 2.5-6.4 beyond 60 days) and for increased liver aminotransferase levels (RR 4.8-6.5 beyond 60 days) were higher for several months after the index date; encounters for infective myocarditis (RR 12.6) were increased in the 90-119-day interval.
The possible cardiac, respiratory, kidney, and liver sequelae identified through this method are consistent with those of previous studies (11-13). For kidney injury, new diagnoses of stage 3 kidney illness (glomerular filtration rate 30-59 mL/min/1.73 m²) were higher than pre-COVID-19. Stage 3 kidney injury might occur when there is more permanent damage requiring repeated healthcare encounters. This method might generate useful hypotheses about the duration of possible sequelae because we found that encounters for increased aminotransferase levels remain increased at least through the 120-149-day interval after acute illness.
The first limitation of our study is that increased encounter rates might be caused by health-seeking behavior. Encounters for new diagnoses are not equivalent to new disease entities, and rates of encounter diagnosis codes might not represent the rates of disease. For long hospitalizations, diagnosis timing might be mischaracterized because actual diagnosis date is uncertain; however, we used imputation to account for this uncertainty. Counting the initial COVID-19 visit as part of the post-COVID-19 period might identify complications of acute illness as sequelae; however, we focused on sequelae with increased rates >60 days after the COVID-19 index date to mitigate that factor. This analysis does not capture exacerbations of underlying conditions, such as worsening heart failure or reactive airway disease, unless disease exacerbation is captured by a different diagnosis code.
The data in this analysis are more representative of adults than children. We excluded pregnancy-related conditions from the analysis. Our findings might not be generalizable to patients with asymptomatic or mild COVID-19 who might not seek healthcare, and we did not control for factors such as aging and changes in societal behavior, so we cannot attribute increased rates of new diagnoses solely to COVID-19. Advantages of our method include rapid application to large longitudinal healthcare datasets and extension over time to identify possible sequelae occurring long after acute illness.

Conclusions
Our findings are consistent with those of other studies using different methods to identify sequelae, including a matched cohort analysis of PHD-SR during the same period and a direct survey of persons with and without previous SARS-CoV-2 test results (14,15). This hypothesis-generating method can provide early signals of possible sequelae for novel diseases and inform additional studies to identify, characterize, and refine potential sequelae for COVID-19 or other emerging diseases.